A Unified Framework For Automatic Evaluation Using 4-Gram Co-occurrence Statistics
نویسندگان
چکیده
In this paper we propose a unified framework for automatic evaluation of NLP applications using N-gram co-occurrence statistics. The automatic evaluation metrics proposed to date for Machine Translation and Automatic Summarization are particular instances from the family of metrics we propose. We show that different members of the same family of metrics explain best the variations obtained with human evaluations, according to the application being evaluated (Machine Translation, Automatic Summarization, and Automatic Question Answering) and the evaluation guidelines used by humans for evaluating such applications.
منابع مشابه
Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics
Following the recent adoption by the machine translation community of automatic evaluation using the BLEU/NIST scoring process, we conduct an in-depth study of a similar idea for evaluating summaries. The results show that automatic evaluation using unigram cooccurrences between summary pairs correlates surprising well with human evaluations, based on various statistical metrics; while direct a...
متن کاملAutomatic Evaluation of Machine Translation Quality Using N-gram Co-Occurrence Statistics
Evaluation is recognized as an extremely helpful forcing function in Human Language Technology R&D. Unfortunately, evaluation has not been a very powerful tool in machine translation (MT) research because it requires human judgments and is thus expensive and time-consuming and not easily factored into the MT research agenda. However, at the July 2001 TIDES PI meeting in Philadelphia, IBM descri...
متن کاملAutomatic Evaluation of Machine Translation Quality Using N-gram Co-Occurrence Statistics
Evaluation is recognized as an extremely helpful forcing function in Human Language Technology R&D. Unfortunately, evaluation has not been a very powerful tool in machine translation (MT) research because it requires human judgments and is thus expensive and time-consuming and not easily factored into the MT research agenda. However, at the July 2001 TIDES PI meeting in Philadelphia, IBM descri...
متن کاملAutomatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics
In this paper we describe two new objective automatic evaluation methods for machine translation. The first method is based on longest common subsequence between a candidate translation and a set of reference translations. Longest common subsequence takes into account sentence level structure similarity naturally and identifies longest co-occurring insequence n-grams automatically. The second m...
متن کاملAutomatic summary assessment for intelligent tutoring systems
Summary writing is an important part of many English Language Examinations. As grading students’ summary writings is a very time-consuming task, computer-assisted assessment will help teachers carry out the grading more effectively. Several techniques such as Latent Semantic Analysis (LSA), n-gram co-occurrence and BLEU have been proposed to support automatic evaluation of summaries. However, t...
متن کامل